
    Improving the User Experience of the rCUDA Remote GPU Virtualization Framework

    Graphics processing units (GPUs) are being increasingly embraced by the high-performance computing community as an effective way to reduce execution time by accelerating parts of their applications. Remote CUDA (rCUDA) was recently introduced as a software solution to address the high acquisition costs and energy consumption of GPUs that constrain further adoption of this technology. Specifically, rCUDA is a middleware that allows a reduced number of GPUs to be transparently shared among the nodes in a cluster. Although the initial prototype versions of rCUDA demonstrated its functionality, they also revealed concerns with respect to usability, performance, and support for new CUDA features. In response, in this paper, we present a new rCUDA version that (1) improves usability by including a new component that allows an automatic transformation of any CUDA source code so that it conforms to the needs of the rCUDA framework, (2) consistently features low overhead when using remote GPUs thanks to an improved communication architecture, and (3) supports multithreaded applications and CUDA libraries. As a result, for any CUDA-compatible program, rCUDA now allows the use of remote GPUs within a cluster with low overhead, so that a single application running in one node can use all GPUs available across the cluster, thereby extending the single-node capability of CUDA. Copyright © 2014 John Wiley & Sons, Ltd.

    This work was funded by the Generalitat Valenciana under Grant PROMETEOII/2013/009 of the PROMETEO program phase II. The author from Argonne National Laboratory was supported by the US Department of Energy, Office of Science, under Contract No. DE-AC02-06CH11357. The authors are also grateful for the generous support provided by Mellanox Technologies.

    Reaño González, C.; Silla Jiménez, F.; Castello Gimeno, A.; Peña Monferrer, AJ.; Mayo Gual, R.; Quintana Ortí, ES.; Duato Marín, JF. (2015). Improving the User Experience of the rCUDA Remote GPU Virtualization Framework. Concurrency and Computation: Practice and Experience. 27(14):3746-3770. https://doi.org/10.1002/cpe.3409

    Boosting the performance of remote GPU virtualization using InfiniBand Connect-IB and PCIe 3.0

    © 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

    A clear trend has emerged involving the acceleration of scientific applications by using GPUs. However, the capabilities of these devices are still generally underutilized. Remote GPU virtualization techniques can help increase GPU utilization rates, while reducing acquisition and maintenance costs. The overhead of using a remote GPU instead of a local one is introduced mainly by the difference in performance between the internode network and the intranode PCIe link. In this paper we show how using the new InfiniBand Connect-IB network adapters (attaining similar throughput to that of the most recently emerged GPUs) boosts the performance of remote GPU virtualization, reducing the overhead to a mere 0.19% in the application tested.

    This work was funded by the Generalitat Valenciana under Grant PROMETEOII/2013/009 of the PROMETEO program phase II. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research (SC-21), under Contract No. DE-AC02-06CH11357. Authors from the Universitat Politècnica de València and Universitat Jaume I are grateful for the generous support provided by Mellanox Technologies.

    Reaño González, C.; Silla Jiménez, F.; Peña Monferrer, AJ.; Shainer, G.; Schultz, S.; Castelló Gimeno, A.; Quintana Orti, ES.... (2014). Boosting the performance of remote GPU virtualization using InfiniBand Connect-IB and PCIe 3.0. In 2014 IEEE International Conference on Cluster Computing (CLUSTER). IEEE. 266-267. doi:10.1109/CLUSTER.2014.6968737
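The overhead figure in this abstract can be read as the relative increase in run time when the GPU is remote rather than local. The abstract does not spell out the formula, so the sketch below assumes that definition, and its timings are made up purely to illustrate the arithmetic.

```python
def remote_overhead_percent(t_local: float, t_remote: float) -> float:
    """Relative slowdown of a remote-GPU run versus a local-GPU run, in percent.

    Assumed definition (not taken from the paper):
        overhead = (t_remote - t_local) / t_local * 100
    """
    if t_local <= 0:
        raise ValueError("t_local must be positive")
    return (t_remote - t_local) / t_local * 100.0


# Hypothetical execution times in seconds, not measurements from the paper.
t_local = 100.0
t_remote = 100.19

overhead = remote_overhead_percent(t_local, t_remote)
print(f"{overhead:.2f}%")
```

Under this definition, an overhead near zero means the network path to the remote GPU is adding almost nothing on top of the local PCIe path.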

    SLURM Support for Remote GPU Virtualization: Implementation and Performance Study

    © 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

    SLURM is a resource manager that can be leveraged to share a collection of heterogeneous resources among the jobs in execution in a cluster. However, SLURM is not designed to handle resources such as graphics processing units (GPUs). Concretely, although SLURM can use a generic resource plugin (GRes) to manage GPUs, with this solution the hardware accelerators can only be accessed by the job that is in execution on the node to which the GPU is attached. This is a serious constraint for remote GPU virtualization technologies, which aim at providing user-transparent access to all GPUs in the cluster, independently of the specific location of the node where the application is running with respect to the GPU node. In this work we introduce a new type of device in SLURM, "rgpu", in order to gain access from any application node to any GPU node in the cluster using rCUDA as the remote GPU virtualization solution. With this new scheduling mechanism, a user can access any number of GPUs, as SLURM schedules the tasks taking into account all the graphics accelerators available in the complete cluster. We present experimental results that show the benefits of this new approach in terms of increased flexibility for the job scheduler.

    The researchers at UPV were supported by the Generalitat Valenciana under Grant PROMETEOII/2013/009 of the PROMETEO program phase II. Researchers at UJI were supported by MINECO, by FEDER funds under Grant TIN2011-23283, and by the Fundacion Caixa-Castelló Bancaixa (Grant P11B2013-21).

    Iserte Agut, S.; Castello Gimeno, A.; Mayo Gual, R.; Quintana Ortí, ES.; Silla Jiménez, F.; Duato Marín, JF.; Reaño González, C.... (2014). SLURM Support for Remote GPU Virtualization: Implementation and Performance Study. In Computer Architecture and High Performance Computing (SBAC-PAD), 2014 IEEE 26th International Symposium on. IEEE. 318-325. https://doi.org/10.1109/SBAC-PAD.2014.49
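The scheduling constraint described in this abstract can be illustrated with a toy feasibility check. This is only a sketch under simplifying assumptions: the `Node` class, both functions, and the example cluster are hypothetical, and the code models neither SLURM's scheduler nor the "rgpu" implementation from the paper.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_gpus: int


def can_run_local_only(nodes, gpus_needed):
    """Local-only rule: a job may use only GPUs attached to the node it runs on."""
    return any(n.free_gpus >= gpus_needed for n in nodes)


def can_run_cluster_wide(nodes, gpus_needed):
    """Remote-GPU rule: a job on any node may use free GPUs anywhere in the cluster."""
    return sum(n.free_gpus for n in nodes) >= gpus_needed


# Hypothetical cluster: four free GPUs in total, at most two on a single node.
cluster = [Node("n0", 1), Node("n1", 1), Node("n2", 0), Node("n3", 2)]

print(can_run_local_only(cluster, 3))    # False: no single node has 3 free GPUs
print(can_run_cluster_wide(cluster, 3))  # True: the cluster as a whole has 4
```

In this example a job asking for three GPUs cannot be placed when GPUs are tied to their host node, but it can be placed once the whole cluster's GPUs form a single pool. That is the kind of extra scheduling flexibility the abstract refers to.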
